Improved techniques for the identification of pseudogenes

Authors

  • Lachlan James M. Coin
  • Richard Durbin
Abstract

MOTIVATION: Pseudogenes are the remnants of genomic sequences of genes that are no longer functional. They are frequent in most eukaryotic genomes and are an important resource for comparative genomics. However, pseudogenes are often mis-annotated as functional genes in sequence databases. Current methods for identifying pseudogenes include methods that rely on the presence of stop codons and frameshifts, as well as methods based on the ratio of non-silent to silent nucleotide substitution rates (dN/dS). A recent survey concluded that 50% of human pseudogenes have no detectable truncation in their pseudo-coding regions, indicating that the former methods lack sensitivity. The latter methods have been used to find sets of genes enriched for pseudogenes, but are not specific enough to accurately separate pseudogenes from expressed genes.

RESULTS: We introduce a program called pseudogene inference from loss of constraint (PSILC), which incorporates novel methods for separating pseudogenes from functional genes. The methods calculate a log-odds score comparing two models of evolution along the final branch of the gene tree leading to the query gene:

  • A neutral nucleotide model compared to a Pfam domain encoding model (PSILC(nuc/dom))
  • A protein coding model compared to a Pfam domain encoding model (PSILC(prot/dom))

Using the manual annotation of human chromosome 6, we show that both these methods result in a more accurate classification of pseudogenes than dN/dS when a Pfam domain alignment is available.

AVAILABILITY: PSILC is available from http://www.sanger.ac.uk/Software/PSILC
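The log-odds comparison described in the results can be illustrated with a minimal Python sketch. This is not PSILC's implementation: the function names, the zero threshold, and the example log-likelihood values are all hypothetical, and the sketch assumes only that each competing model (for example, a neutral nucleotide model and a Pfam domain encoding model) yields a natural-log likelihood for the final branch leading to the query gene.

```python
def log_odds(loglik_model_a: float, loglik_model_b: float) -> float:
    """Log-odds of model A against model B, given natural-log likelihoods.

    Hypothetical helper: a positive value means the data are better
    explained by model A than by model B.
    """
    return loglik_model_a - loglik_model_b


def classify(score: float, threshold: float = 0.0) -> str:
    """Label a query gene from its log-odds score.

    The threshold of 0.0 is an assumption for illustration; in practice
    a cutoff would be calibrated on annotated data.
    """
    return "pseudogene-like" if score > threshold else "functional-like"


# Made-up log-likelihoods for one query gene (illustrative only).
loglik_neutral = -100.0  # neutral nucleotide model
loglik_domain = -105.0   # domain encoding model

score = log_odds(loglik_neutral, loglik_domain)
label = classify(score)
```

In this sketch, a score above the threshold means the neutral model fits the final branch better than the domain model, which is the kind of loss of constraint the abstract associates with pseudogenes.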

Journal:
  • Bioinformatics

Volume 20 Suppl 1  Issue 

Pages  -

Publication date 2004